[AUTO] Fix the timing issue in AUTO inference #27290

Open: 18 commits to merge into base: master


yangwang201911
Contributor

Details:

  • Added synchronization

Tickets:

  • 153629

@github-actions bot added the "category: AUTO" (OpenVINO AUTO device selection plugin) label on Oct 29, 2024
@yangwang201911 marked this pull request as ready for review on November 13, 2024, 01:39
@yangwang201911 requested a review from a team as a code owner on November 13, 2024, 01:39
@yangwang201911 changed the title from "Ywang2/fix the timing issue" to "[AUTO] Fix the timing issue in AUTO inference" on Nov 14, 2024
std::unique_lock<std::mutex> lck(worker_infer_mutex);
if (!idle_workerrequests.try_pop(worker)) {
idle_workerrequests_cv.wait(lck, [&idle_workerrequests, &worker] {
return idle_workerrequests.try_pop(worker);
Contributor

Overall, this solution appears to wait indefinitely until it can get a worker, which is not the intended design: by design, m_infer_pipeline_tasks holds the tasks that cannot be assigned a worker at that moment.
Maybe we can consider increasing the number of CPU worker infer requests to 2 to avoid this deadlock. Or, if we need a condition variable to fix this issue, can we at least try to align with the m_infer_pipeline_tasks design?
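
For illustration only, the non-blocking design the reviewer refers to can be sketched as follows. This is a minimal sketch, not the AUTO plugin's code: `Scheduler`, `Task`, `schedule`, `on_worker_done` and the integer worker ids are hypothetical stand-ins, and the pending queue only plays the role that the comment attributes to m_infer_pipeline_tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for an inference task; it receives the id of the
// worker it was assigned to.
using Task = std::function<void(int worker_id)>;

class Scheduler {
public:
    explicit Scheduler(int num_workers) {
        assert(num_workers > 0);
        for (int id = 0; id < num_workers; ++id) {
            idle_workers_.push(id);
        }
    }

    // Starts the task on an idle worker if one exists. Otherwise the task is
    // parked in the pending queue and the caller returns without blocking.
    // Returns true when the task was started immediately.
    bool schedule(Task task) {
        int worker = -1;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (idle_workers_.empty()) {
                pending_tasks_.push(std::move(task));
                return false;
            }
            worker = idle_workers_.front();
            idle_workers_.pop();
        }
        task(worker);  // invoked outside the lock
        return true;
    }

    // Called when a worker finishes. A parked task, if any, takes the worker
    // over directly; otherwise the worker goes back to the idle queue.
    void on_worker_done(int worker) {
        Task next;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_tasks_.empty()) {
                idle_workers_.push(worker);
                return;
            }
            next = std::move(pending_tasks_.front());
            pending_tasks_.pop();
        }
        next(worker);  // invoked outside the lock
    }

    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_tasks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::queue<int> idle_workers_;
    std::queue<Task> pending_tasks_;
};
```

In this sketch the "no idle worker, so park the task" decision and the "worker finished, so check for parked tasks" decision are taken under the same mutex, so a task cannot be parked in the gap between a worker finishing and that worker being returned to the idle queue. Whether the real plugin can be arranged this way is a question for the PR author.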

// This is necessary to handle the case where a request worker is popped from the idle queue before being pushed back.
// Without at least 2 requests, there could be a situation where no requests are available for inference,
// leading to potential deadlocks.
num_requests = num_requests <= 1 ? 2 : num_requests;
Contributor

Let's trigger the local CI tests for AUTO to make sure there is no regression.

(m_context->m_device_priorities.end() == it_numrequests || it_numrequests->num_requests_per_devices == -1)
? optimal_num
: it_numrequests->num_requests_per_devices;
num_requests = num_requests <= 1 && m_context->m_performance_hint == ov::hint::PerformanceMode::THROUGHPUT
Contributor

Why THROUGHPUT here?

Contributor Author

As we discussed offline earlier, not having this option will cause a hang.

Contributor

No, that was about cumulative throughput, wasn't it? Otherwise, how can you expect this PR to fix the customer issue, which occurs in latency mode?

Contributor Author

Sorry, I got it. I will debug why it hangs without this option.
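
To make the point of this thread concrete, here is a small sketch of the two variants quoted in the excerpts. It is illustrative only: `PerformanceMode` is a hypothetical stand-in for `ov::hint::PerformanceMode`, the function names are invented, and the second variant assumes the truncated expression in the diff ends with `? 2 : num_requests`, which the excerpt does not show.

```cpp
// Hypothetical stand-in for ov::hint::PerformanceMode; not the OpenVINO type.
enum class PerformanceMode { LATENCY, THROUGHPUT, CUMULATIVE_THROUGHPUT };

// Variant from the first excerpt: always keep at least two requests.
int clamp_always(int num_requests) {
    return num_requests <= 1 ? 2 : num_requests;
}

// Variant from the second excerpt: raise the floor only for THROUGHPUT.
// ASSUMPTION: the truncated source line continues with "? 2 : num_requests".
int clamp_if_throughput(int num_requests, PerformanceMode mode) {
    return (num_requests <= 1 && mode == PerformanceMode::THROUGHPUT)
               ? 2
               : num_requests;
}
```

Under that assumption, `clamp_if_throughput(1, PerformanceMode::LATENCY)` leaves the count at 1, so the floor of two requests would not apply to a latency-mode configuration, which is the concern raised in the review.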


4 participants